AI Plagiarism
AI Copyright
The definitive 128-page summary of copyright issues, written by a team at Cornell Law School: Talkin’ ’Bout AI Generation: Copyright and the Generative AI Supply Chain
Also see The Verge’s list of AI companies’ reasons against paying for copyrighted data. Especially note the fair-use decision in Sega v. Accolade, in which the 9th Circuit concluded that it’s okay to make an intermediate copy of something in order to reverse-engineer it.
“Why I just resigned my job in generative AI” is a counter-argument from Ed Newton-Rex, who quit as VP of Audio at Stability AI because he thinks training on copyrighted content is stealing. I disagree, because one of the factors affecting whether an act of copying is fair use, according to Congress, is “the effect of the use upon the potential market for or value of the copyrighted work.”
Fake News
A 2017 paper, 3HAN: A Deep Neural Network for Fake News Detection, claims to detect fake news with a three-level hierarchical attention network over words, sentences, and the headline. It’s unclear whether it actually flags news it has determined to be false, or whether it simply relies on sensationalism cues. #todo Look for more recent work, perhaps by searching for papers that cite this one.
I suppose in theory you could try to detect how different one news item is from the “consensus”.
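A toy sketch of that consensus idea: represent each article as a bag-of-words vector, average a set of consensus articles into a centroid, and flag items with low cosine similarity to it. (A real system would use learned embeddings rather than raw word counts; the mini-corpus below is invented for illustration.)

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector as a Counter of lowercase tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)  # Counter returns 0 for missing keys
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Average several bag-of-words vectors into a 'consensus' vector."""
    c = Counter()
    for v in vectors:
        c.update(v)
    return Counter({w: n / len(vectors) for w, n in c.items()})

# Invented example: two 'consensus' reports vs. a mainstream and an outlier item
consensus = [bow("the election results were certified by officials"),
             bow("officials certified the election results on tuesday")]
center = centroid(consensus)

mainstream = bow("election results certified by state officials")
outlier = bow("aliens secretly rigged every voting machine")

# The outlier shares no vocabulary with the consensus, so it scores lower
assert cosine(mainstream, center) > cosine(outlier, center)
```

This only measures lexical distance, of course; a story can be far from the consensus because it is false, or because it is breaking news.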
AI-Generated Content
In early April 2023, I noticed Amazon listing a dozen self-published books by “Lorraine Henwood” uploaded in the past week, including a workbook for “Age of Scientific Wellness”. By May the “workbooks” had been removed, except for one about “The Wisdom of Morrie”.
Plagiarism Detection
OpenAI concludes AI writing detectors don’t work
In a section of the FAQ titled “Do AI detectors work?”, OpenAI writes, “In short, no. While some (including OpenAI) have released tools that purport to detect AI-generated content, none of these have proven to reliably distinguish between AI-generated and human-generated content.”
The NYTimes pitted a bunch of AI image detectors against real and Midjourney-generated images in How Easy Is It to Fool A.I.-Detection Tools?, concluding it’s very hard to tell the difference, especially if an image has been resized or otherwise altered. Their conclusion: rely on watermarks.
Alberto Romero at The Algorithmic Bridge:
> (If you want to read a more in-depth analysis of how exactly detectors work and fail, I recommend you to check this overview by AI researcher Sebastian Raschka where he reviews the four main types of detectors and explains how they differ. For a hands-on assessment, I loved this article by Benj Edwards on Ars Technica.)
see https://gptzero.substack.com/
Currently the app uses a few properties: perplexity (the randomness of a text to a model, i.e. how well a language model predicts the text) and burstiness (machine-written text exhibits more uniform and constant perplexity over time, while human-written text varies more). The app is hosted on Streamlit.
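The two signals can be sketched with a toy unigram model: perplexity is the exponentiated average negative log-probability of the words, and burstiness here is measured as the variance of per-sentence perplexity. (GPTZero uses a large language model, not a unigram model, and its exact burstiness formula isn’t public; this is just the shape of the idea, with an invented mini-corpus.)

```python
import math
from collections import Counter

def unigram_perplexity(text, model_counts, total):
    """Perplexity of `text` under a Laplace-smoothed unigram model.
    A stand-in for the LLM-based scoring real detectors use."""
    words = text.lower().split()
    vocab = len(model_counts) + 1  # +1 for the unseen-word bucket
    log_prob = sum(
        math.log((model_counts[w] + 1) / (total + vocab)) for w in words
    )
    return math.exp(-log_prob / len(words))

def burstiness(sentences, model_counts, total):
    """Variance of per-sentence perplexity: low variance suggests the
    uniform 'flatness' associated with machine-generated text."""
    ppls = [unigram_perplexity(s, model_counts, total) for s in sentences]
    mean = sum(ppls) / len(ppls)
    return sum((p - mean) ** 2 for p in ppls) / len(ppls)

# Tiny invented training corpus for the unigram model
corpus = "the cat sat on the mat the dog sat on the rug".split()
counts = Counter(corpus)
total = len(corpus)

# In-distribution text scores lower perplexity than out-of-vocabulary text
assert unigram_perplexity("the cat sat", counts, total) < \
       unigram_perplexity("zebra quagga xylophone", counts, total)

# Mixing familiar and unfamiliar sentences yields nonzero burstiness
assert burstiness(["the cat sat", "a zebra ran"], counts, total) > 0
```

The detector’s intuition is that LLMs sample consistently probable tokens, so both numbers stay low and flat; humans produce occasional high-perplexity spikes.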
Plagiarism Prevention
C2PA, an open technical standard from Adobe/Microsoft/etc, provides publishers, creators, and consumers the ability to trace the origin of different types of media.
OpenAI announced support, plus additional APIs and an upcoming Media Manager that will make it possible for content creators to opt out.
To drive adoption and understanding of provenance standards - including C2PA - we are joining Microsoft in launching a societal resilience fund. This $2 million fund will support AI education and understanding, including through organizations like Older Adults Technology Services from AARP, International IDEA, and Partnership on AI.
Adobe calls it “content credentials”; it works by encoding provenance information in a signed manifest whose hashes cryptographically bind the metadata to the image data.
But Fast Company thinks it may just confuse things more:
> the smallest adjustments made to real images can be flagged as more questionable than completely made up pictures.
PhotoGuard, developed at MIT (VentureBeat), adds imperceptible perturbations to images to disrupt AI-based manipulation.
Humanize AI Output
BypassGPT claims to rewrite any text snippet to make it pass AI plagiarism detectors, including Copyleaks.
My experience
This guy took my DeSci piece, rewrote it slightly, and posted it yesterday to his LinkedIn account of 5000 followers:
https://www.linkedin.com/feed/update/urn:li:activity:6998831186554859520/
500 reactions already
Every paragraph is just a rewrite. I wrote this:
In a DeSci world, the indelible nature of the blockchain closes off many sources of outright fraud. Smart contracts, by eliminating humans from the loop, can’t be bribed or intimidated, for example.
He writes this:
The indelible nature of the blockchain eliminates several sources of blatant #fraud in a #DeSci society. Smart contracts, by removing humans from the loop, cannot be bribed or intimidated.
The whole piece is like this!
2022-12-01 4:33 PM